Data transmission method and calculation apparatus for neural network, electronic apparatus, computer-readable storage medium and computer program product

ABSTRACT

Provided are a data transmission method for a neural network, and a related product. The method includes the following steps: acquiring a weight specification of weight data stored in a memory, comparing the weight specification with a specification of a write memory in terms of size and determining a comparison result; according to the comparison result, dividing the write memory into a first-in first-out write memory and a multiplexing write memory; according to the comparison result, determining data reading policies of the first-in first-out write memory and the multiplexing write memory; and according to the data reading policies, reading weights from the first-in first-out write memory and the multiplexing write memory and loading the weights to a calculation circuit. The technical solution provided by the present application has the advantages of low power consumption and short calculation time.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority from Chinese Patent Application No. 201711484341.0 entitled “DATA TRANSMISSION METHOD FOR NEURAL NETWORK AND RELATED PRODUCTS” and filed on Dec. 29, 2017, the content of which is hereby incorporated in its entirety by reference.

FIELD

The present disclosure relates to the field of artificial intelligence (AI) technology, and more particularly, to a data transmission method and a calculation apparatus for a neural network, an electronic apparatus, a computer-readable storage medium and a computer program product.

BACKGROUND

With the increasing maturity of artificial intelligence technology, application scenarios and product demands in all walks of life show explosive growth, which makes applications of artificial intelligence based on deep learning algorithms widely used. However, in the process of hardware and chip implementation, the deep learning algorithms generally encounter a key problem: a parameter set of the network model, namely the WEIGHT, generally ranges in size from tens of thousands of bits to megabits. So how to effectively load the WEIGHT to a calculation circuit for calculation has become an urgent problem to be solved at present. The existing technical solutions cannot realize fast transmission of weight data, which affects calculation performance, prolongs calculation time and reduces user experience.

SUMMARY

Embodiments of the present disclosure provide a data transmission method for a neural network and related products, which can reuse weight data many times, reduce the number of weight loads, save data transmission bandwidth and save calculation time.

In a first aspect, one embodiment of the present disclosure provides a data transmission method for a neural network, and the data transmission method includes the following steps:

acquiring a weight specification of weight data stored in a memory, comparing the weight specification with a specification of a write memory in terms of size, to obtain a comparison result;

dividing the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result;

determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result;

reading weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and loading the weights to a calculation circuit.
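For illustration only (not part of the claimed solution), the comparison and division logic of the steps above can be sketched in C as follows; all names, the abstract size units, and the fifo_fraction parameter are hypothetical:

```c
/* Illustration only: a minimal C sketch of the compare/divide/policy
 * steps above. All names, units and the fifo_fraction parameter are
 * hypothetical, not part of the disclosed hardware. */
#include <stddef.h>
#include <stdio.h>

enum policy { POLICY_FIFO_ONLY, POLICY_FIFO_PLUS_REUSE };

struct wm_layout {
    size_t fifo_size;   /* size of the first-in first-out write memory area */
    size_t mux_size;    /* size of the multiplexing write memory area       */
    enum policy policy; /* data reading policy chosen from the comparison   */
};

static struct wm_layout divide_write_memory(size_t weight_spec, size_t wm_spec,
                                            double fifo_fraction) {
    struct wm_layout l;
    if (weight_spec <= wm_spec) {
        /* Weights fit: the whole write memory is used as one FIFO and the
         * cached weights can be reused by later tasks. */
        l.fifo_size = wm_spec;
        l.mux_size  = 0;
        l.policy    = POLICY_FIFO_ONLY;
    } else {
        /* Weights do not fit: keep a FIFO area for dynamic loading and a
         * multiplexing area whose contents are read repeatedly. */
        l.fifo_size = (size_t)((double)wm_spec * fifo_fraction);
        l.mux_size  = wm_spec - l.fifo_size;
        l.policy    = POLICY_FIFO_PLUS_REUSE;
    }
    return l;
}

int main(void) {
    struct wm_layout a = divide_write_memory(800, 1000, 0.2);  /* weights fit   */
    struct wm_layout b = divide_write_memory(1200, 1000, 0.2); /* weights don't */
    printf("a: fifo=%zu mux=%zu policy=%d\n", a.fifo_size, a.mux_size, a.policy);
    printf("b: fifo=%zu mux=%zu policy=%d\n", b.fifo_size, b.mux_size, b.policy);
    return 0;
}
```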

Optionally, the data reading policies include: a first-in first-out mode or a combination mode of first-in first-out and repeated reading.

Optionally, the step of dividing the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and the step of determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result, include:

if the comparison result is: the weight specification <= the specification of the write memory,

configuring a start address of the first-in first-out write memory to be zero, and an end address of the first-in first-out write memory to be the maximum address value of the write memory;

determining a first direct memory and a second direct memory to be in a first-in first-out mode;

writing the weights to the write memory by the first direct memory, detecting a first-in first-out operation level by the second direct memory after the second direct memory is started, and reading the weights from the write memory and transmitting the weights to the calculation circuit by the second direct memory, when the first-in first-out operation level changes from a high level to a low level.

Optionally, the step of dividing the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and the step of determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result, include:

if the comparison result is: the weight specification > the specification of the write memory, dividing the weights into first part weights and second part weights;

placing the first part weights in an area of the multiplexing write memory, wherein the area of the multiplexing write memory is a write memory area that is repeatedly read and used by the second direct memory;

starting a first task and a second task by the second direct memory, wherein the first task is configured to repeatedly acquire the first part weights from the area of the multiplexing write memory, and the second task is configured to dynamically load the second part weights from the memory through the area of the first-in first-out write memory by use of a first-in first-out mode;

performing calculation by use of the first part weights first and then by use of the second part weights, by the calculation circuit.

In a second aspect, one embodiment of the present disclosure provides a calculation apparatus, and the calculation apparatus includes: a memory, a first direct memory, a write memory, a second direct memory, a calculation circuit, and a control circuit; wherein, the first direct memory is configured to write weights stored in the memory to the write memory;

the control circuit is configured to acquire a weight specification of weight data stored in the memory, compare the weight specification with a specification of the write memory in terms of size, to obtain a comparison result, divide the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and determine data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result;

the second direct memory is configured to read weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and load the weights to the calculation circuit.

Optionally, the data reading policies include: a first-in first-out mode or a combination mode of first-in first-out and repeated reading.

Optionally, if the comparison result is: the weight specification <= the specification of the write memory, the control circuit is further configured to configure a start address of the first-in first-out write memory to be zero, and an end address of the first-in first-out write memory to be the maximum address value of the write memory, and to determine the first direct memory and the second direct memory to be in a first-in first-out mode;

the second direct memory detects a first-in first-out operation level after the second direct memory is started, and when the first-in first-out operation level changes from a high level to a low level, the second direct memory reads the weights from the write memory and transmits the weights to the calculation circuit.

Optionally, if the comparison result is: the weight specification > the specification of the write memory, the control circuit is further configured to divide the weights into first part weights and second part weights;

the first direct memory is specifically configured to place the first part weights in an area of the multiplexing write memory and cache the second part weights in an area of the first-in first-out write memory;

the area of the multiplexing write memory is a write memory area that is repeatedly read and used by the second direct memory, and the area of the first-in first-out write memory is a write memory area that is dynamically loaded by the second direct memory;

the second direct memory is configured to start a first task and a second task, wherein the first task is used to repeatedly acquire the first part weights from the area of the multiplexing write memory, and the second task is used to dynamically load the second part weights from the memory through the area of the first-in first-out write memory by use of the first-in first-out mode; and

the calculation circuit is configured to perform calculation by use of the first part weights first, and then perform calculation by use of the second part weights.

In a third aspect, one embodiment of the present disclosure provides an electronic apparatus, and the electronic apparatus includes the calculation apparatus provided in the second aspect.

In a fourth aspect, one embodiment of the present disclosure provides a computer-readable storage medium, on which computer programs are stored for electronic data interchange, and the computer programs enable a computer to perform the data transmission method provided in the first aspect.

In a fifth aspect, one embodiment of the present disclosure provides a computer program product, including a non-transient computer-readable storage medium in which computer programs are stored, and the computer programs enable a computer to perform the data transmission method provided in the first aspect.

Embodiments of the present disclosure have the following beneficial effects:

as can be seen from the above technical solution, the present disclosure can compare the weight specification with the specification of the write memory in terms of size to obtain the comparison result, and then on the basis of the comparison result determine how to divide the write memory and determine the data reading policies, so that the data reading policies for the weights can be dynamically adjusted according to the weight specification. Thus, the technical solution provided by the present disclosure can improve reading speed, reduce calculation time, save power consumption and improve user experience.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, a brief description of the drawings used in the detailed description of the embodiments is provided below. Obviously, the drawings described below are merely some embodiments of the disclosure; for persons of ordinary skill in this field, other drawings can be obtained from the drawings below without creative work.

FIG. 1 is a block diagram of an electronic apparatus provided in one embodiment of the present disclosure.

FIG. 2 is a schematic diagram of a neural network model provided in one embodiment of the present disclosure.

FIG. 3 is a flowchart of a data transmission method for a neural network provided in one embodiment of the present disclosure.

FIG. 4 is a block diagram of a calculation apparatus provided in one embodiment of the present disclosure.

FIG. 5 is a block diagram of a calculation apparatus provided in another embodiment of the present disclosure.

DETAILED DESCRIPTION

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one skilled in the art that the subject matter may be practiced without these specific details. Based on the embodiments of the disclosure, all other embodiments obtained by persons of ordinary skills in this field without creative work shall fall within the protection scope of the present disclosure.

The terms “first”, “second”, “third” and “fourth” in the specification, claims and drawings of the present disclosure are used to distinguish different objects, but not used to describe a particular sequence. In addition, the terms “include” and “have” and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that contains a series of steps or units is not only limited to the listed steps or units, but optionally includes steps or units not listed, or optionally includes other steps or units inherent to the processes, methods, products or devices.

The reference to “embodiments” in the present disclosure means that specific characteristics, structures or features described in the embodiments can be included in at least one embodiment of the present disclosure. The term “embodiment” shown in various positions in the specification does not necessarily refer to the same embodiment, nor does it refer to embodiments that are independent of, or exclusive with, other embodiments or alternative embodiments. It can be understood both explicitly and implicitly by persons of ordinary skills in this field that the embodiments described herein can be combined with other embodiments.

An electronic apparatus described in the embodiments of the disclosure may include: a server, a smart camera, a smart phone (such as an Android phone, an iOS phone, a Windows Phone, etc.), a tablet computer, a handheld computer, a laptop, a mobile internet device (MID) or a wearable device, etc. These are only examples, not an exhaustive list, and the electronic apparatus is not limited to those listed above. For the sake of description, the electronic apparatus mentioned above is referred to as a user equipment (UE), a terminal or an electronic device in the following embodiments. Of course, in practical applications, the above-mentioned electronic apparatus is not limited to the above realization forms. For example, it can also include: an intelligent vehicle-mounted terminal, computer equipment, and so on.

Referring to FIG. 1, FIG. 1 is a block diagram of an electronic apparatus provided in one embodiment of the present disclosure. The electronic apparatus can include: a processor 101, a memory 102 and a neural network chip 103. The processor 101 is electrically connected to the memory 102 and the neural network chip 103. The memory 102 can include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), etc. The technical solution provided by the present disclosure does not limit whether the neural network chip 103 is set up separately or integrated into the processor 101; that is, the neural network chip 103 can be set up separately or be integrated into the processor 101.

Referring to FIG. 2, FIG. 2 is a schematic diagram of a neural network model provided in one embodiment of the present disclosure. As shown in FIG. 2, the values of the WEIGHTS of each neural network model can also be referred to as weights, and the weights basically determine the computational complexity of the neural network model. Inputs of the neural network model include two channels: one is the weight (such as the Filter shown in FIG. 2), and the other is input data (such as the Input Image shown in FIG. 2). One output of the neural network model is output data (such as the Output Image shown in FIG. 2). Since the weights are generally multi-dimensional data, and generally range in size from tens of kilobytes (KB) to megabytes (MB), how to effectively load the weights and reduce the waiting time of a calculation circuit has become an important problem to be solved by the neural network at present.

In one embodiment, the neural network model can include many layers of calculations. Each layer of calculation may include, for example, a matrix-by-matrix multiplication operation, a convolution operation, and other complex operations. A calculation solution of the neural network model is introduced in the following contents. In detail, the calculation solution can be divided into several layers of calculations, and each layer of calculation is an operation between the input data and the weight of a corresponding layer, namely, an operation between the Input Image and the Filter as shown in FIG. 2. The operation may include, but is not limited to: convolution operations, matrix-by-matrix multiplication operations, and so on. The schematic diagram shown in FIG. 2 can be a convolution operation at a certain layer of the neural network model, specifically:

the Filters represent the weights in the neural network model;

the Input Image represents the input data of the present disclosure;

the Output Image represents the output data of the present disclosure;

each CO can be obtained by adding all the products of each input data element multiplied by a corresponding weight.

The number of the weights is CI NUM*CO NUM, and each weight is a two-dimensional matrix data structure.
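As a purely illustrative aside (not part of the disclosure), the per-CO accumulation just described can be sketched in C; for brevity the sketch assumes 1×1 filters so that each of the CI NUM*CO NUM weights reduces to a scalar, whereas the disclosure uses two-dimensional filter matrices:

```c
/* Illustrative sketch of the per-CO accumulation: each output channel is
 * the sum of products of input data and corresponding weights. Channel
 * counts and values are hypothetical; filters are reduced to scalars. */
#include <stdio.h>

#define CI_NUM 3 /* number of input channels  (CI NUM) */
#define CO_NUM 2 /* number of output channels (CO NUM) */

int main(void) {
    float input[CI_NUM] = {1.0f, 2.0f, 3.0f};  /* one input value per channel */
    float weight[CO_NUM][CI_NUM] = {           /* CI NUM * CO NUM weights     */
        {0.1f, 0.2f, 0.3f},
        {0.4f, 0.5f, 0.6f},
    };
    for (int co = 0; co < CO_NUM; co++) {
        float acc = 0.0f;
        for (int ci = 0; ci < CI_NUM; ci++)
            acc += input[ci] * weight[co][ci]; /* add all the products */
        printf("CO[%d] = %f\n", co, acc);
    }
    return 0;
}
```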

Referring to FIG. 3, FIG. 3 is a flowchart of a data transmission method for a neural network provided in one embodiment of the present disclosure, which can be executed by a neural network chip or a processing chip as shown in FIG. 4. A chip structure in FIG. 4 includes:

a memory, such as a double data rate synchronous dynamic RAM (DDR), configured to load parameters of a neural network when a system is powered up;

a first direct memory, configured to load and transmit the parameters of the neural network from the DDR to a cache of a write memory (WM);

a second direct memory, configured to obtain the parameters of the neural network from the cache of the write memory and send the parameters to a calculation circuit;

the calculation circuit, configured for calculation;

a first-in first-out write memory: a First Input First Output (FIFO) area of the write memory;

a multiplexing write memory: a non-FIFO area of the write memory;

a write memory controller: a controller and an arbiter of the write memory.

In a multiplexing write memory mode, the handshake granularity between the first direct memory and the second direct memory is task-level, which can be described as follows:

after the first direct memory writes all the data that needs to be transferred by a task into the multiplexing write memory, it notifies the second direct memory through an interrupt, and the second direct memory begins to read the data. Of course, there is a direct hardware handshake between the first direct memory and the second direct memory, which does not require software synchronization.

In the multiplexing write memory mode, the maximum amount of data that can be transferred is equal to the specification of the write memory; when the amount of data is greater than the specification of the write memory, data that is transferred subsequently will overwrite data transferred previously.

In a first-in first-out write memory mode, the handshake granularity between the first direct memory and the second direct memory is at a read-write address pointer level, which can be described as follows:

the first direct memory is configured to write data; the second direct memory is configured to detect whether any line of valid data has been written by the first direct memory, and to begin reading data as soon as such a line of valid data is detected, without waiting for the first direct memory to write all the data.

In the first-in first-out write memory mode, data with an amount greater than the specification of the write memory can be transferred by use of only a limited storage space, and the second direct memory and the first direct memory perform a handshake by use of a first-in first-out operation level, to ensure that the data is not overwritten or lost.
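The contrast between the two modes can be sketched as follows (illustration only; the ring buffer, depth, and function names are hypothetical stand-ins for the hardware FIFO and its operation level):

```c
/* Illustration only: a software model of the first-in first-out write
 * memory and its operation level. The depth, names and data are
 * hypothetical; the point is that the consumer (second direct memory)
 * can read as soon as one line is valid, instead of waiting for the
 * whole transfer as in the multiplexing write memory mode. */
#include <stdbool.h>
#include <stdio.h>

#define WM_LINES 4              /* hypothetical FIFO depth in lines */

static int wm[WM_LINES];
static int wr, rd, count;       /* write pointer, read pointer, fill level */

static bool level_full(void)  { return count == WM_LINES; } /* level "pulled up" for the writer */
static bool level_empty(void) { return count == 0; }        /* level "pulled up" for the reader */

static bool dma1_write(int line) {   /* first direct memory: producer */
    if (level_full()) return false;  /* must hold its request until the level drops */
    wm[wr] = line; wr = (wr + 1) % WM_LINES; count++;
    return true;
}

static bool dma2_read(int *line) {   /* second direct memory: consumer */
    if (level_empty()) return false; /* waits only for one valid line */
    *line = wm[rd]; rd = (rd + 1) % WM_LINES; count--;
    return true;
}

int main(void) {
    int v;
    dma1_write(42);                  /* one line written ...        */
    if (dma2_read(&v))               /* ... is immediately readable */
        printf("read line %d without waiting for the full transfer\n", v);
    return 0;
}
```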

The data transmission method for the neural network as shown in FIG. 3 at least includes the following steps:

step S301: acquiring a weight specification of weight data stored in a memory;

step S302: comparing the weight specification with a specification of a write memory in terms of size, to obtain a comparison result;

step S303: dividing the write memory into a first-in first-out write memory and a multiplexing write memory, and determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result;

step S304: reading weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and loading the weights to a calculation circuit.

The data transmission method for the neural network provided in the present disclosure can compare the weight specification with the specification of the write memory in terms of size, to obtain the comparison result, and then on the basis of the comparison result determine how to divide the write memory and determine the data reading policies, so that the data reading policies for the weights can be dynamically adjusted according to the weight specification. Thus, the technical solution provided by the present disclosure can improve reading speed, reduce calculation time, save power consumption and improve user experience.

For example, the comparison result can be: the weight specification <= the specification of the write memory, namely, the weight specification being less than or equal to the specification of the write memory.

Based on the above comparison result, the weights can be cached in the write memory for repeated use; only when a first task is performed do the weights need to be loaded once from the DDR, to perform a first load.

A start address of the first-in first-out write memory can be configured to be zero, and an end address of the first-in first-out write memory can be configured to be the maximum address value of the write memory.

The first direct memory and the second direct memory can be configured in the first-in first-out mode by use of software configuration, and the first direct memory and the second direct memory are started.

The first direct memory can transmit all the weights into the first-in first-out write memory and the multiplexing write memory.

After the second direct memory is started, the second direct memory first waits for the first-in first-out operation level to be pulled down. Once the first direct memory starts to write data, the second direct memory starts to validly read the data and transmits the data to the calculation circuit, and the calculation circuit starts calculation.

After the first load is completed, the following tasks do not need to be reloaded from the DDR, and the second direct memory only needs to read and use the data repeatedly, which reduces the number of times that the task is reloaded from the DDR, therefore saving transmission bandwidth.

In this scenario, compared to the multiplexing write memory mode, in the first-in first-out write memory mode, the first task can be started earlier. As long as a valid parameter has been written by the first direct memory, the second direct memory/the calculation circuit can start to operate. In the multiplexing write memory mode, the second direct memory/the calculation circuit does not start to operate until the first direct memory has finished transferring all the weights.
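A back-of-envelope sketch of this startup-latency difference follows; the line count and per-line cost are invented for illustration:

```c
/* Back-of-envelope sketch of the startup-latency difference described
 * above; all numbers are hypothetical. */
#include <stdio.h>

int main(void) {
    int total_lines = 1024;     /* lines of weights in the first task           */
    int cycles_per_line = 8;    /* assumed cost of one DDR -> write-memory line */
    int fifo_start = 1 * cycles_per_line;           /* start after one valid line */
    int mux_start  = total_lines * cycles_per_line; /* start after the full load  */
    printf("first-in first-out mode start delay: %d cycles\n", fifo_start);
    printf("multiplexing mode start delay: %d cycles\n", mux_start);
    return 0;
}
```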

For example, the comparison result can be: the weight specification > the specification of the write memory, namely, the weight specification being greater than the specification of the write memory.

At this point, the write memory cannot cache all the weights; when the weights need to be reused, not all of them can be reused. The existing technical solution needs to load the weights dynamically from the DDR every time, which has a great impact on bandwidth demand and performance.

The write memory is divided into two areas for use, namely, the first-in first-out write memory and the multiplexing write memory. The weights are divided into two parts: first part weights and second part weights. The combination of the first part weights and the second part weights constitutes the whole of the weights.

The first part weights are positioned in the area of the multiplexing write memory, are only loaded once from the DDR, and are repeatedly read and used by the second direct memory.

The second part weights are dynamically loaded to the area of the first-in first-out write memory, and need to be dynamically loaded from the DDR every time.

At the beginning of each calculation task, the second direct memory directly acquires the first part weights from the area of the multiplexing write memory, without any wait. At the same time, the second part weights are loaded through the first-in first-out write memory.

Therefore, each task of the second direct memory will activate two second direct memory tasks: task 1 and task 2. The task 1 is in a non-first-in first-out mode, and is responsible for acquiring the first part weights from the area of the multiplexing write memory. The task 2 is in the first-in first-out mode, and is responsible for acquiring the second part weights from the first-in first-out write memory.

The larger the area of the multiplexing write memory is, the more the first part weights can be reused, and the more bandwidth is saved. Of course, in practical use, a reasonable first-in first-out specification of the write memory can be set in combination with the bandwidth of the DDR. The first-in first-out specification of the write memory can resist jitter and imbalance of the bandwidth of the DDR, so as to keep the first-in first-out write memory non-empty as far as possible.

The following is an example with an actual weight specification to illustrate the specific solution.

Assuming that the weight specification is equal to 1.2 times the specification of the write memory, then 80% of the area of the write memory is set as the multiplexing write memory, and the remaining 20% of the area of the write memory is set as the first-in first-out write memory.

Then, the multiplexing write memory can store 66.7% of the weights, that is, the first part weights are equal to ⅔ of the weights.

The remaining 33.3% of the weights are loaded in the first-in first-out write memory, that is, the second part weights are determined to be equal to ⅓ of the weights.

Every time a task is started, the second direct memory reads the first part weights first. Specifically, the second direct memory first reads the parameters of the first part weights stored in the area of the multiplexing write memory, and transmits the first part weights to the calculation circuit for calculation. At the same time, the remaining weights are dynamically loaded in the first-in first-out write memory, and then the second direct memory reads the second part weights and transmits the second part weights to the calculation circuit for calculation.

Therefore, each task can save 66.7% of the bandwidth, and this ratio can be adjusted and optimized according to different model conditions and bandwidth conditions.
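The arithmetic of this example can be checked with a short sketch, with the write memory specification normalized to 1.0 (illustration only):

```c
/* Sanity check of the 1.2x example above, with the write memory
 * specification normalized to 1.0 (illustration only). */
#include <stdio.h>

int main(void) {
    double wm      = 1.0;        /* specification of the write memory           */
    double weights = 1.2 * wm;   /* weight specification = 1.2 x write memory   */
    double mux     = 0.8 * wm;   /* 80% of the area: multiplexing write memory  */
    /* first part weights: fraction of the weights held in the multiplexing area */
    printf("first part  = %.1f%% of the weights\n", 100.0 * mux / weights);             /* 66.7 */
    printf("second part = %.1f%% of the weights\n", 100.0 * (weights - mux) / weights); /* 33.3 */
    return 0;
}
```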

Therefore, the technical solution provided by this disclosure has the following advantages: the write memory, though of small storage capacity, can store model parameters of various sizes; the multiplexing write memory can be reused, which can save bandwidth effectively and leave prefetch time for the first-in first-out write memory; and the first-in first-out write memory can perform the hardware handshake at a valid-data granularity, which can completely eliminate the wait overhead of weight parameters.

The handshake mechanisms, namely the communication modes, between the various hardware components are described below. Of course, the following handshake mechanisms are only for illustration. It should be understood that the embodiments of this disclosure are not limited to the following handshake mechanisms.

A handshake mechanism between the write memory and the first direct memory/the second direct memory can include:

handshake signals: a request signal (REQ), a received signal (READY), and a valid indication (VALID).

The request signal (REQ) indicates a read and write request signal, which is valid at a high level (such as, logic 1).

The received signal==1 indicates that the write memory can receive the request signal; the received signal==0 indicates that the write memory cannot receive the request signal at present.

The valid indication indicates a read and write data valid indication, which is valid at a high level (such as, logic 1).

A handshake mode between the write memory controller and the first direct memory is as follows:

a request signal is transmitted from the first direct memory to the write memory controller;

the write memory controller provides a received signal according to an arbitration condition;

if the received signal==0, then the first direct memory needs to keep the current request signal unchanged until the received signal==1;

if the received signal==1, then the first direct memory can continue to send a next request signal;

when the data is written to the write memory, the write memory controller will return a valid indication to the first direct memory;

the first direct memory receives the valid indication, and can determine, according to the valid indication, that the data has been written to the write memory.

A handshake mode between the write memory and the second direct memory is as follows:

a request signal is transmitted from the second direct memory to the write memory controller;

the write memory controller provides the received signal according to the arbitration condition;

if the received signal==0, the second direct memory needs to keep the current request signal unchanged until the received signal==1;

if the received signal==1, the second direct memory can continue to send a next request signal;

when the data is ready, the write memory controller will return a valid indication signal to the second direct memory;

the second direct memory acquires valid data when receiving the valid indication signal.
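For illustration, the request-holding rule shared by both handshake modes above can be modeled as follows (a hypothetical cycle-level model, not RTL; the cycle counts are invented):

```c
/* Illustration only (not RTL): the request-holding rule shared by both
 * handshake modes above. A request is accepted only in a cycle where
 * the controller's READY is high; until then the direct memory holds
 * the same request on the bus. */
#include <stdbool.h>
#include <stdio.h>

static bool handshake(bool req, bool ready) {
    return req && ready;        /* accepted only when both are high */
}

int main(void) {
    bool ready_by_cycle[] = { false, false, true }; /* arbitration busy for two cycles */
    for (int cycle = 0; cycle < 3; cycle++) {
        if (handshake(true, ready_by_cycle[cycle])) {
            printf("cycle %d: request accepted, VALID indication follows\n", cycle);
            break;
        }
        printf("cycle %d: READY==0, request held unchanged on the bus\n", cycle);
    }
    return 0;
}
```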

A hardware handshake mode for the area of the first-in first-out write memory is as follows:

the first direct memory and the second direct memory are configured in a first-in first-out mode by software;

the write memory controller is configured to synthetically arbitrate the request signals of the first direct memory/the second direct memory; the area of the first-in first-out write memory and the write memory controller also need to additionally implement an empty/full arbitration logic of the first-in first-out;

when the write memory controller detects that the first-in first-out operation level is pulled up, the received signal is set to 0, namely, the received signal=0, and the write memory controller refuses to receive a new request signal from the first direct memory until the first-in first-out operation level is pulled down;

when the write memory controller detects that the first-in first-out operation level is pulled up, the received signal is set to 0, namely, the received signal=0, and the write memory controller refuses to receive a new request signal from the second direct memory until the first-in first-out operation level is pulled down (i.e., changes from a high level to a low level);

the first-in first-out write memory is empty by default; when the second direct memory first initiates a read access, because the first-in first-out operation level is valid and the received signal==0, according to a handshake protocol, the second direct memory is required to keep the request signal on a bus all the time; once the first direct memory performs a valid write, the first-in first-out operation level is pulled down and the received signal=1, and then the second direct memory begins to read the valid data;

if the first direct memory continues to write data, and the second direct memory reads data relatively late or slowly, causing the first-in first-out operation level to be pulled up, then according to the handshake protocol, the first direct memory is required to keep the current request signal on the bus all the time; once the second direct memory reads the data, the first-in first-out operation level is pulled down and the received signal=1, and then the first direct memory can continue to write the data.

In the first-in first-out mode, the handshake granularity of the second direct memory and the first direct memory is a single piece of valid data, and the second direct memory and the first direct memory can form a pipelined access at that handshake granularity, which can reduce the waiting cost of software scheduling and parameter transfer.
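The pipelined, per-valid-data interleaving described above can be sketched as follows (hypothetical depth and line counts; illustration only):

```c
/* Illustration only: per-valid-data pipelining through the FIFO area.
 * A 2-line FIFO (hypothetical depth) carries 6 lines of data; each side
 * stalls only on the empty/full operation level, so data larger than
 * the storage space flows through without being overwritten. */
#include <stdio.h>

#define DEPTH 2

int main(void) {
    int fill = 0;                       /* FIFO fill level */
    int written = 0, read = 0, total = 6;
    while (read < total) {
        if (written < total && fill < DEPTH) { written++; fill++; } /* level not full: first direct memory writes  */
        else if (fill > 0)                   { read++;    fill--; } /* level not empty: second direct memory reads */
    }
    printf("transferred %d lines through a %d-line FIFO\n", read, DEPTH);
    return 0;
}
```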

Referring to FIG. 5, FIG. 5 is a block diagram of a calculation apparatus provided in another embodiment of the present disclosure. The calculation apparatus includes: a memory 501, a first direct memory 502, a write memory 503, a second direct memory 504, a calculation circuit 505, and a control circuit 506.

The first direct memory 502 is configured to write weights stored in the memory 501 to the write memory 503.

The control circuit 506 is configured to acquire a weight specification of weight data stored in the memory 501, compare the weight specification with a specification of the write memory 503 in terms of size, to obtain a comparison result, divide the write memory 503 into a first-in first-out write memory and a multiplexing write memory according to the comparison result, and determine data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result.

The second direct memory 504 is configured to read weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and load the weights to the calculation circuit 505.

In one embodiment, the data reading policies can include:

a first-in first-out mode or a combination mode of first-in first-out and repeated reading.

In one embodiment, the control circuit 506 is further configured to, for example, when the comparison result is: the weight specification <= the specification of the write memory 503, configure a start address of the first-in first-out write memory to be zero, and an end address of the first-in first-out write memory to be the maximum address value of the write memory 503, and determine the first direct memory 502 and the second direct memory 504 to be in a first-in first-out mode.

After the second direct memory 504 is started, the second direct memory 504 detects the first-in first-out operation level, and when the first-in first-out operation level changes from a high level to a low level, the second direct memory 504 reads the weights from the write memory 503 and transmits the weights to the calculation circuit 505.

In one embodiment, the control circuit 506 is further configured to, for example, when the comparison result is: the weight specification > the specification of the write memory 503, divide the weights into first part weights and second part weights.

The first direct memory 502 is specifically configured to place the first part weights in an area of the multiplexing write memory and cache the second part weights in an area of the first-in first-out write memory.

The area of the multiplexing write memory is a write memory area that can be repeatedly read and used by the second direct memory 504, and the area of the first-in first-out write memory is a write memory area that can be dynamically loaded by the second direct memory 504.

The second direct memory 504 is configured to start a first task and a second task. The first task is used to repeatedly acquire the first part weights from the area of the multiplexing write memory, and the second task is used to dynamically load the second part weights from the memory 501 through the area of the first-in first-out write memory by use of the first-in first-out mode.

The calculation circuit 505 is configured to perform calculation by use of the first part weights first, and then perform calculation by use of the second part weights.

Embodiments of the present disclosure further provide an electronic apparatus including the calculation apparatus mentioned above.

Embodiments of the present disclosure further provide a computer-readable storage medium in which computer programs are stored for electronic data interchange, and the computer programs enable a computer to perform some or all steps of any of the data transmission methods for a neural network as described in the method embodiments mentioned above.

Embodiments of the present disclosure further provide a computer program product including a non-transient computer-readable storage medium in which computer programs are stored, and the computer programs enable a computer to perform some or all steps of any of the data transmission methods for a neural network as described in the method embodiments mentioned above.

It should be noted that, for brevity of description, the method embodiments above are expressed as a series of action combinations; however, a person having ordinary skill in the field should be aware that this application is not limited by the action sequences described above, because according to this application, some steps may be performed in other orders or simultaneously. Secondly, a person having ordinary skill in the field should also be aware that the embodiments described in the specification are optional embodiments, and that the actions and modules involved are not necessarily required by this application.

In the above embodiments, the description of each embodiment has its own emphasis, and for parts not specified in one embodiment, reference can be made to the relevant description of other embodiments.

It should be understood that the disclosed apparatus in the embodiments provided by the present disclosure can be implemented in other ways. For example, the apparatus embodiments described above are merely schematic; for example, the division of the modules is merely a division of logical functions, which can also be realized in other ways; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. On the other hand, the coupling, direct coupling or communication connection shown or discussed may be achieved through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical or otherwise.

The modules described as separate parts may or may not be physically separated, and the parts displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units.

In addition, each functional module in each embodiment of the disclosure may be integrated into one processing unit, or each unit may physically exist separately, or two or more units may be integrated into one unit. The integrated unit mentioned above can be realized either in the form of hardware or in the form of hardware plus software functional modules.

The integrated units may be stored in a computer-readable memory if implemented as a software program module and sold or used as a separate product. Based on this understanding, the technical solutions of the application, in essence, or the part that contributes to the existing technology, or all or part of the technical solutions, can be manifested in the form of software products. The computer software products are stored in a memory and include several instructions to make a computer equipment (such as a personal computer, a server or a network equipment, etc.) perform all or part of the steps of the method described in each embodiment of this application. The aforementioned memory includes: a USB flash drive, a ROM (Read-Only Memory), a RAM (Random Access Memory), a mobile hard disk drive, a diskette, a CD-ROM or other storage medium that can store program codes.

A person having ordinary skills in the field can understand that all or part of the steps in the various methods described in the embodiments of this application can be executed by corresponding hardware instructed by programs. The programs can be stored in a computer-readable storage, and the computer-readable storage can include: a flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a disk or a compact disk (CD), etc.

Although the disclosure is described in combination with specific features and embodiments, it is evident that it can be modified and combined in various ways without departing from the spirit and scope of the disclosure. Accordingly, this specification and the accompanying drawings are only exemplary descriptions of the disclosure as defined by the claims and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of the disclosure. The foregoing descriptions are merely exemplary embodiments of the present disclosure, but are not intended to limit the protection scope of the present disclosure. Any variation or replacement made by persons of ordinary skills in the art without departing from the spirit of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the appended claims.

1. A data transmission method for a neural network, comprising: acquiring a weight specification of weight data stored in a memory, comparing the weight specification with a specification of a write memory in terms of size, to obtain a comparison result; dividing the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result; determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result; reading weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and loading the weights to a calculation circuit.
 2. The data transmission method of claim 1, wherein, the data reading policies comprise: a first-in first-out mode or a combination mode of first-in first-out and repeated reading.
3. The data transmission method of claim 1, the step of dividing the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and the step of determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result, comprising: if the comparison result is: the weight specification <= the specification of the write memory, configuring a start address of the first-in first-out write memory to be zero, and an end address of the first-in first-out write memory to be the maximum address value of the write memory; determining a first direct memory and a second direct memory to be in a first-in first-out mode; writing the weights to the write memory by the first direct memory, detecting a first-in first-out operation level by the second direct memory after the second direct memory is started, and reading the weights from the write memory and transmitting the weights to the calculation circuit by the second direct memory, when the first-in first-out operation level changes from a high level to a low level.
4. The data transmission method of claim 1, the step of dividing the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and the step of determining data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result, comprising: if the comparison result is: the weight specification > the specification of the write memory, dividing the weights into first part weights and second part weights; placing the first part weights in an area of the multiplexing write memory, wherein, the area of the multiplexing write memory is a write memory area that is repeatedly read and used by the second direct memory; starting a first task and a second task by the second direct memory, wherein, the first task is configured to repeatedly acquire the first part weights from the area of the multiplexing write memory, and the second task is configured to dynamically load the second part weights from the memory through the area of the first-in first-out write memory by use of a first-in first-out mode; performing calculation by use of the first part weights firstly and then performing calculation by use of the second part weights by the calculation circuit.
 5. A calculation apparatus, comprising: a memory, a first direct memory, a write memory, a second direct memory, a calculation circuit, and a control circuit; wherein, the first direct memory is configured to write weights stored in the memory to the write memory; the control circuit is configured to acquire a weight specification of weight data stored in the memory, compare the weight specification with a specification of the write memory in terms of size, to obtain a comparison result, divide the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and determine data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result; the second direct memory is configured to read weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and load the weights to the calculation circuit.
 6. The calculation apparatus of claim 5, wherein, the data reading policies comprise: a first-in first-out mode or a combination mode of first-in first-out and repeated reading.
7. The calculation apparatus of claim 5, wherein, if the comparison result is: the weight specification <= the specification of the write memory, the control circuit is further configured to configure a start address of the first-in first-out write memory to be zero, and an end address of the first-in first-out write memory to be the maximum address value of the write memory; determine the first direct memory and the second direct memory to be in a first-in first-out mode; the second direct memory detects a first-in first-out operation level after the second direct memory is started, and when the first-in first-out operation level changes from a high level to a low level, the second direct memory reads the weights from the write memory and transmits the weights to the calculation circuit.
8. The calculation apparatus of claim 5, wherein, if the comparison result is: the weight specification > the specification of the write memory, the control circuit is further configured to divide the weights into first part weights and second part weights; the first direct memory is specifically configured to place the first part weights in an area of the multiplexing write memory and cache the second part weights in an area of the first-in first-out write memory; the area of the multiplexing write memory is a write memory area that is repeatedly read and used by the second direct memory, and the area of the first-in first-out write memory is a write memory area that is dynamically loaded by the second direct memory; the second direct memory is configured to start a first task and a second task, wherein, the first task is used to repeatedly acquire the first part weights from the area of the multiplexing write memory, and the second task is used to dynamically load the second part weights from the memory through the area of the first-in first-out write memory by use of the first-in first-out mode; and the calculation circuit is configured to perform calculation by use of the first part weights firstly, and then perform calculation by use of the second part weights.
 9. An electronic apparatus, comprising a calculation apparatus, the calculation apparatus comprising: a memory, a first direct memory, a write memory, a second direct memory, a calculation circuit, and a control circuit; wherein, the first direct memory is configured to write weights stored in the memory to the write memory; the control circuit is configured to acquire a weight specification of weight data stored in the memory, compare the weight specification with a specification of the write memory in terms of size, to obtain a comparison result, divide the write memory into a first-in first-out write memory and a multiplexing write memory, according to the comparison result, and determine data reading policies of the first-in first-out write memory and the multiplexing write memory, according to the comparison result; the second direct memory is configured to read weights from the first-in first-out write memory and the multiplexing write memory, according to the data reading policies, and load the weights to the calculation circuit.
 10. (canceled)
 11. (canceled) 